A Unified Gradient-Descent/Clustering Architecture for Finite State Machine Induction

Das, Sreerupa, Mozer, Michael C.

Neural Information Processing Systems

Although recurrent neural nets have been moderately successful in learning to emulate finite-state machines (FSMs), the continuous internal state dynamics of a neural net are not well matched to the discrete behavior of an FSM. We describe an architecture, called DOLCE, that allows discrete states to evolve in a net as learning progresses. DOLCE consists of a standard recurrent neural net trained by gradient descent and an adaptive clustering technique that quantizes the state space. DOLCE is based on the assumption that a finite set of discrete internal states is required for the task, and that the actual network state belongs to this set but has been corrupted by noise due to inaccuracy in the weights. DOLCE learns to recover the discrete state with maximum a posteriori probability from the noisy state.
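To make the MAP recovery step concrete, the sketch below snaps a noisy continuous hidden state to the candidate discrete state with the highest posterior probability, assuming the state was corrupted by isotropic Gaussian noise. This is a minimal sketch of the idea under that assumption, not the authors' implementation; the function map_quantize, its parameters, and the toy values are all illustrative.

```python
# Hypothetical sketch of a DOLCE-style MAP state-quantization step.
# Assumption: the hidden state h is a cluster center mu_k corrupted by
# isotropic Gaussian noise with standard deviation sigma.
import numpy as np

def map_quantize(h, centers, priors, sigma):
    """Return the cluster center with maximum posterior probability
    given the noisy hidden state h.

    h:       (d,) continuous hidden state from the recurrent net
    centers: (k, d) candidate discrete states (cluster centers)
    priors:  (k,) prior probability of each discrete state
    sigma:   assumed standard deviation of the corrupting noise
    """
    # log posterior is proportional to: log prior - ||h - mu_k||^2 / (2 sigma^2)
    sq_dist = np.sum((centers - h) ** 2, axis=1)
    log_post = np.log(priors) - sq_dist / (2.0 * sigma ** 2)
    return centers[np.argmax(log_post)]

# Toy usage: three candidate discrete states in a 2-D state space.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
priors = np.array([0.5, 0.3, 0.2])
noisy_state = np.array([0.9, 0.1])
print(map_quantize(noisy_state, centers, priors, sigma=0.2))  # -> [1. 0.]
```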


Researchers often try to understand, post hoc, the representations that emerge in the hidden layers of a neural net following training. Interpretation is difficult because these representations are typically highly distributed and continuous. By "continuous," we mean that if one constructed a scatterplot over the hidden unit activity space of patterns obtained in response to various inputs, examination at any scale would reveal the patterns to be broadly distributed over the space.
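As an illustration of such a scatterplot, the sketch below projects collected hidden-unit activity vectors onto their top two principal components and plots them; a continuous, distributed representation would show points spread broadly over the plane rather than falling into tight clusters. The helper pca_2d and the random placeholder data are illustrative assumptions, not part of the paper.

```python
# Illustrative sketch (not from the paper): scatterplot of hidden-unit
# activity vectors, projected to 2-D with PCA, to inspect whether the
# hidden states cluster or spread continuously over the space.
import numpy as np
import matplotlib.pyplot as plt

def pca_2d(states):
    """Project (n, d) hidden-state vectors onto their top two
    principal components via SVD."""
    centered = states - states.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# hidden_states stands in for activity vectors collected from the net
# in response to various inputs (random data as a placeholder here).
hidden_states = np.random.rand(200, 10)
xy = pca_2d(hidden_states)
plt.scatter(xy[:, 0], xy[:, 1], s=8)
plt.title("Hidden-state scatterplot (2-D PCA projection)")
plt.show()
```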

